Shotgun Scanning Glycomutagenesis: A Simple and Efficient Strategy for Constructing and Characterizing Neoglycoproteins

2021-11-16 18:32:35 By : Mr. GUANGSHAN LI

Edited by Ronald T. Raines, Massachusetts Institute of Technology, Cambridge, Massachusetts, and accepted by Editorial Board Member Susan Marqusee on August 13, 2021 (received for review April 20, 2021)

Asparagine-linked (N-linked) protein glycosylation, the covalent attachment of a complex sugar to the nitrogen atom of an asparagine side chain, is the most widespread and most complex posttranslational modification of proteins. N-glycosylation affects a large number of cellular proteins and can profoundly influence their most important properties, such as biological activity, chemical solubility, folding and stability, immunogenicity, and serum half-life. Accordingly, the strategic installation of glycans at naive sites (positions that are not glycosylated in the native protein) has become an attractive means of endowing proteins with advantageous biological and/or biophysical properties. Here, we describe a glycoprotein engineering strategy that enables systematic investigation of the structural and functional consequences of glycan installation at every position along a protein backbone and provides a new route to custom-made glycoproteins.

As a common protein modification, asparagine-linked (N-linked) glycosylation can greatly affect the biological and biophysical properties of proteins. However, the routine use of glycosylation as a strategy for engineering proteins with advantageous properties is limited by our inability to construct and screen large collections of glycoproteins and thereby catalog the consequences of glycan installation. To address this challenge, we describe a combinatorial strategy called shotgun scanning glycomutagenesis, in which a DNA library encoding all possible glycosylation site variants of a given protein is constructed and then expressed in glycosylation-competent bacteria, so that glycosylatable sites in the protein can be determined rapidly. The resulting neoglycoproteins are readily subjected to available high-throughput assays, making it possible to systematically investigate the structural and functional consequences of glycan conjugation along the protein backbone. The utility of this approach was demonstrated with three different acceptor proteins, namely the bacterial immunity protein Im7, bovine pancreatic ribonuclease A, and a human anti-HER2 single-chain Fv antibody, all of which were found to tolerate N-glycan attachment at a large number of positions and with relatively high efficiency. The stability and activity of many glycovariants were significantly altered by N-linked glycans in a manner that depended critically on the precise location of the modification. Structural models suggested that affinity was improved by the creation of new interfacial contacts with a glycan at the periphery of a protein-protein interface. Importantly, we anticipate that our glycomutagenesis workflow should provide access to unexplored regions of glycoprotein structural space and to custom-made neoglycoproteins with desirable properties.

Glycosylation of asparagine residues is one of the most abundant and complex protein posttranslational modifications (1, 2) and occurs in all domains of life (3). Owing to their relatively large size and hydrophilicity, or simply their presence at specific locations, asparagine-linked (N-linked) glycans can significantly alter protein properties, including biological activity, chemical solubility, folding and stability, immunogenicity, and serum half-life (4, 5). Glycosylation therefore effectively increases the diversity of the proteome by enriching the repertoire of protein characteristics beyond that dictated by the 20 canonical amino acids. For example, there is increasing evidence that the immune system diversifies the antigen-specific repertoire by specifically targeting the antigen-binding sites of immunoglobulins (IgGs) with posttranslational modifications, in particular N-linked glycosylation (6). In addition, the profound effect of glycans on proteins has prompted extensive glycoengineering efforts to rationally manipulate key glycosylation parameters (for example, glycan size and structural composition, and glycosylation location and occupancy) as a means of optimizing protein properties for a range of industrial and therapeutic applications (7–10).

Despite some remarkable successes, we currently cannot predict which sites within a protein can be glycosylated or how glycosylation at permissive sites will affect protein structure and function. Consequently, the routine use of glycosylation as a strategy for engineering proteins with advantageous properties is currently limited. Indeed, a deeper understanding of the design rules (that is, how glycans affect the biological and biophysical properties of proteins) represents a grand challenge for the glycoprotein engineering field. To this end, computational approaches can explore the influence of glycosylation on protein folding and stability in silico (11, 12); however, these involve a trade-off between molecular detail and glycoprotein size, and all-atom molecular dynamics simulations are typically limited to short glycopeptides or protein domains (11). Ideally, exploring the consequences of glycosylation experimentally requires access to large collections of chemically defined glycoproteins in quantities sufficient for characterization (13). Mammalian cells represent an obvious choice for obtaining proteins with native and naive glycosites. However, because gene transfection and culture of mammalian cells are time consuming and low throughput, studies using mammalian cell-based expression systems have typically investigated only a handful of designs (about 15 or fewer) (14–17), with rare exceptions such as the tour de force study of Elliott et al. (18). Moreover, in mammalian expression systems, the inherent variability (microheterogeneity) of the glycan structure at a given site can be unpredictable and difficult to control. Another option is chemical synthesis, which can provide structurally uniform glycopeptides for studying the local effects of N-linked glycans on peptide conformation (19). Although total chemical synthesis of full-length proteins remains challenging, advances in expressed protein ligation (EPL) have opened the door to the convergent assembly of chemically synthesized glycopeptides with recombinantly expressed protein domains to form larger glycoproteins bearing complex N-glycans installed at discrete sites (20, 21). Using this technology, Imperiali and colleagues created a panel of seven site-specifically glycosylated variants of the bacterial immunity protein Im7, each modified with the disaccharide N,N'-diacetylchitobiose (GlcNAc2), and evaluated the kinetic and thermodynamic consequences of glycan installation at the defined positions (22). Unfortunately, EPL is a technically demanding procedure that requires manual construction of each individual glycoprotein, which effectively limits the number of testable glycosite designs to a small handful.

To move beyond these "one glycosite at a time" methods for supplying glycoproteins, we describe here a scalable technique called shotgun scanning glycomutagenesis (SSGM), which involves the design and construction of combinatorial acceptor protein libraries in which 1) each library member carries a single N-glycosite "mutation" introduced at a defined position along the protein backbone and 2) the complete ensemble of glycan acceptor sites (sequons) in the library effectively covers the target protein (Figure 1). The resulting SSGM libraries are expressed in N-glycosylation-competent bacteria in the context of glycoSNAP (glycosylation of secreted N-linked acceptor proteins), a versatile high-throughput screen based on the extracellular secretion of glycosylated proteins (23). Using this glycoprotein engineering tool, we constructed and screened SSGM libraries corresponding to three model proteins: the bacterial immunity protein Im7, bovine pancreatic ribonuclease A (RNase A), and a human single-chain variable fragment antibody specific for HER2 (scFv-HER2). Our results show that installation of N-glycans was tolerated at a large number of positions and in all types of secondary structure, with relatively high N-glycosylation efficiency in most cases. For many of these neoglycoproteins, the presence of N-glycans at naive sites had a measurable effect on protein stability and/or activity, and that effect depended on the precise location of the modification. Taken together, these findings demonstrate that the SSGM method can generate large collections of discretely modified neoglycoproteins that collectively reveal glycosylatable sites and provide insight into the influence of site-specific N-glycan installation on structural and/or functional properties.

Figure 1. Construction of neoglycoproteins by SSGM. Schematic of SSGM, a glycoprotein engineering method based on combinatorial protein libraries in which glycosylation "sequon walking" is used to introduce an acceptor site at every possible position along the protein backbone. Note that the multiresidue nature of a sequon (for example, N-X-S/T or D/E-X1-N-X2-S/T, where X, X1, X2 ≠ P) requires the insertion or replacement of up to five amino acids at each position. The resulting library is expressed in glycoengineered bacteria, which provides each library member with an opportunity to be expressed and glycosylated. The method is compatible with high-throughput screening by glycoSNAP to interrogate the glycosylation phenotype of individual variants. By combining the expressed SSGM library with multiplexed assays, the biochemical and biophysical properties of each neoglycoprotein can be interrogated individually. Depicted is the engineered Campylobacter jejuni GalNAc5(Glc)GlcNAc heptasaccharide, with a reducing-end GlcNAc (blue square) followed by five GalNAc residues (yellow squares) and a branching glucose (blue circle). The structure is drawn according to the Symbol Nomenclature for Glycans (SNFG; https://www.ncbi.nlm.nih.gov/glycans/snfg.html).

To enable screening of SSGM libraries, we first sought to establish glycoSNAP-based screening for a protein of interest (POI). In the original glycoSNAP assay, Escherichia coli YebF, a small (10 kDa in its mature form) extracellularly secreted protein (24), was genetically modified with an artificial glycosite (for example, N-X-S/T or D/E-X1-N-X2-S/T, where X, X1, X2 ≠ P) at its C terminus. The modified YebF protein is expressed in E. coli cells that carry the N-glycosylation machinery of Campylobacter jejuni (25) and that are bound to a nitrocellulose filter membrane. After secretion from the filter-bound colonies, the putatively glycosylated YebF is captured on a second nitrocellulose membrane, which is probed with antibodies or lectins to detect N-linked glycans. In this way, glycoSNAP creates a convenient genotype-glycophenotype linkage for easy scoring (glycosylated versus aglycosylated) of YebF proteins secreted from individual bacterial colonies (Figure 2A). Here, we hypothesized that genetic fusion of a glycosite-modified POI to YebF would result in extracellular secretion of the fusion protein, so that glycans installed on the POI could be detected with the same nitrocellulose membrane-based screening strategy. To test this hypothesis, we initially used E. coli Im7 as the POI for the following reasons: 1) it is a small globular protein of 87 residues that lacks disulfide bonds and is well expressed in the periplasm, where bacterial N-glycosylation occurs (25); 2) although it is not a native glycoprotein, Im7 bearing a DQNAT glycosylation tag at its C terminus can be glycosylated by the C. jejuni N-glycosylation machinery in E. coli (25); 3) crystal structures are available for wild-type (wt) Im7 (26) and for Im7 in complex with its cognate toxin colicin E7 (ColE7) (27); and 4) a limited set of seven Im7 variants was previously generated to determine the effect of GlcNAc2 attachment on folding and stability (22), which provides useful reference points for comparison.
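The two sequon definitions quoted above (N-X-S/T and D/E-X1-N-X2-S/T, with no proline at the X positions) can be illustrated with a short Python sketch that scans a sequence for acceptor asparagines. The function name `find_sequons`, the `asn_offset` parameter, and the toy sequence are assumptions made for illustration; they are not part of the published workflow.

```python
import re

# Eukaryotic-style sequon: N-X-S/T with X != P.
# A lookahead is used so that overlapping sequons are all reported.
EUKARYOTIC = re.compile(r"(?=(N[^P][ST]))")
# Extended bacterial sequon: D/E-X1-N-X2-S/T with X1, X2 != P.
BACTERIAL = re.compile(r"(?=([DE][^P]N[^P][ST]))")


def find_sequons(seq, pattern, asn_offset):
    """Return 1-based positions of the acceptor Asn for every sequon match.

    asn_offset is the 0-based index of the Asn within the motif
    (0 for N-X-S/T, 2 for D/E-X1-N-X2-S/T).
    """
    return [m.start() + asn_offset + 1 for m in pattern.finditer(seq)]


toy = "MADQNATGSNPTKENGS"  # made-up sequence, not a real protein
print(find_sequons(toy, EUKARYOTIC, 0))  # Asn positions in N-X-S/T sequons
print(find_sequons(toy, BACTERIAL, 2))   # Asn positions in D/E-X1-N-X2-S/T sequons
```

In the toy sequence, the N-P-T stretch is skipped because proline occupies the X position, which is the exclusion stated in the sequon definition.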

Figure 2. Construction and interrogation of SSGM libraries. (A) Schematic of SSGM library construction using multiplex inverse PCR. The resulting plasmid DNA library encodes neoglycoprotein variants with a glycosite substitution at every possible position and is used to cotransform E. coli strain CLM24 together with two additional plasmids encoding the requisite N-glycosylation machinery from Campylobacter jejuni. The resulting bacterial library is plated on solid agar, after which the colonies and their secreted glycoproteins are replica plated onto nitrocellulose membranes as described in the text. (B) Immunoblot analysis of acceptor proteins in colony secretions (left) and extracellular supernatant fractions (right) from E. coli CLM24 carrying a plasmid encoding YebF-Im7 or YebF-Im7DQNAT together with the N-glycosylation machinery plasmids encoding either wt CjPglB or an inactive mutant (mut). (C) Same as B, but with YebF-RNase A and YebF-RNase AN34 in colony secretions (left) and periplasmic fractions (right). Blots were probed with anti-polyhistidine antibody (α-His) to detect the acceptor protein and with SBA or hR6 serum to detect the glycan. The bottom color panels in B and C depict the overlay of the α-His and SBA blots (merge). Arrows indicate the aglycosylated (g0) and monoglycosylated (g1) forms of YebF-Im7DQNAT or YebF-RNase AN34. Molecular weight (MW) markers are shown on the left. Results are representative of at least three biological replicates.

To determine whether Im7 is compatible with the glycoSNAP procedure, E. coli strain CLM24 was transformed with a plasmid encoding YebF-Im7 modified with a DQNAT glycosylation tag (28) at the C terminus of Im7, along with two additional plasmids: one encoding the glycosyltransferase (GT) enzymes for biosynthesis of the N-glycan and another encoding the oligosaccharyltransferase (OST) that transfers the resulting N-glycan to the acceptor protein. To minimize microheterogeneity so that modified acceptor proteins would all carry the same glycan, we created a system for producing homogeneous N-glycans with the structure GalNAc5(Glc)GlcNAc-Asn (Figure 1). This is one of several structurally related glycan donors that can be transferred efficiently to target proteins in E. coli by the Campylobacter jejuni OST PglB (CjPglB) (29, 30). Although the biotechnological value of this glycan is questionable, it provides an excellent model for our proof-of-concept SSGM studies for several reasons. First, it involves formation of the key GlcNAc-Asn linkage, which is the same linkage found in prototypical eukaryotic N-glycans. Second, it can potentially be converted into a complex eukaryotic glycan by a two-step digestion/transglycosylation process (31). Third, its structural uniformity and relatively high abundance when produced heterologously in E. coli cells, together with its compatibility with PglB, help ensure that differences in glycosylation efficiency are minimally influenced by substrate-related factors and can instead be attributed to the given acceptor site.

When plated on solid agar and subjected to colony blotting, cells expressing YebF-Im7DQNAT or a control YebF-Im7 construct lacking the glycosylation tag secreted the fusion into the extracellular medium, as evidenced by reaction of the membranes with an anti-His antibody (Figure 2B). However, only strains expressing YebF-Im7DQNAT in the presence of wt CjPglB, and not a CjPglB variant inactivated by two active-site mutations (D54N and E316Q) (23), gave rise to colonies that reacted with soybean agglutinin (SBA) (Figure 2B), a lectin that binds the terminal GalNAc residues of the C. jejuni N-glycan (29). The colony blotting results were corroborated by immunoblot analysis of culture supernatants, which showed that both YebF-Im7 and YebF-Im7DQNAT were secreted into the extracellular medium but that only the latter was glycosylated, as evidenced by the appearance of a higher-molecular-weight band in the blot probed with glycan-specific antiserum (Figure 2B). As expected, no glycan-specific signal was detected in colony blots or immunoblots corresponding to cells carrying the mutant CjPglB enzyme (Figure 2B). Importantly, the predominant glycan attached to YebF-Im7DQNAT corresponded to GalNAc5(Glc)GlcNAc, which mass spectrometry confirmed to represent more than 98% of all detected glycoforms (SI Appendix, Figure 1). Collectively, these results confirm the compatibility of bacterial Im7 with our glycosylation workflow, which yields a homogeneously modified acceptor protein that is readily detected by glycoSNAP screening.

Next, the plasmid encoding YebF-Im7 was mutagenized to create a library of Im7 gene sequences, each carrying a single sequon substitution and cumulatively covering all positions in the Im7 protein. Mutagenesis was performed using multiplex inverse PCR (32) with a set of divergent, abutting primers designed to amplify the entire plasmid and introduce an acceptor asparagine residue at each position (with the two upstream and two downstream residues changed to DQ and AT, respectively). This yielded a highly focused plasmid library enriched in in-frame clones, each bearing a DQNAT acceptor motif at a defined position (Figure 2A). Indeed, next-generation sequencing of the preselection plasmid library confirmed complete coverage of all glycosite positions in Im7, with >103 reads detected at all but one position (SI Appendix, Figure 2). With all glycosite variants present and accounted for, the resulting plasmid library was introduced into strain CLM24 carrying the requisite N-glycosylation machinery, and the library-transformed cells were plated on solid agar and subjected to glycoSNAP screening. From one membrane, we detected a total of about 200 glycosylation-positive colonies, of which 20 were randomly chosen for further analysis. Sequencing confirmed the presence of a single in-frame DQNAT motif in each isolated hit, with the Im7N37 and Im7N58 variants (where the number following N indicates the position of the asparagine residue) occurring three and two times, respectively (Figure 3A). The hits were fairly evenly distributed throughout the Im7 sequence and were located in every type of secondary structure, including bends, turns, and α-helices. This is consistent with X-ray crystallographic data indicating that occupied glycosylation sites can occur on all secondary structure elements (33). Immunoblot analysis confirmed that each selected clone was efficiently glycosylated (Figure 3B).
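The "sequon walking" idea behind the library can be sketched at the protein level in Python: a DQNAT motif replaces a five-residue window at each successive position, so the acceptor Asn lands on each residue in turn. This is only an in silico illustration under simplifying assumptions. The function name `sequon_walk` and the toy sequence are hypothetical, the sketch ignores primer design and the DNA level entirely, and it does not attempt to reproduce the exact number of variants reported for Im7.

```python
def sequon_walk(seq, motif="DQNAT"):
    """Yield (asn_position, variant) for each in-place motif substitution.

    The motif overwrites a window of len(motif) residues, so the protein
    length is unchanged. asn_position is the 1-based position of the
    acceptor Asn in the variant.
    """
    asn_index = motif.index("N")
    for start in range(len(seq) - len(motif) + 1):
        variant = seq[:start] + motif + seq[start + len(motif):]
        yield start + asn_index + 1, variant


toy = "MKELLSGAVR"  # made-up 10-residue sequence, not Im7
library = dict(sequon_walk(toy))
for position, variant in library.items():
    print(f"N{position}: {variant}")
```

Because the window is substituted rather than inserted, the first and last two residues of the toy sequence can never carry the Asn in this sketch; how the terminal positions are handled is an aspect of library design that is not specified here.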

Figure 3. Construction and characterization of the bacterial Im7 neoglycoprotein library. (A) Primary sequence and predicted secondary structure of the Escherichia coli Im7 immunity protein. Asterisks indicate the location and frequency of glycosite hits isolated using SSGM. The predicted structure is adapted from Protein Data Bank (PDB) ID code 1AYI. (B) Immunoblot analysis of supernatant fractions from CLM24 cells carrying plasmids encoding YebF-Im7 fusions with sequon mutations at the indicated positions together with the requisite N-glycosylation machinery. Blots were probed with anti-polyhistidine antibody (α-His) to detect the acceptor protein (top) and with hR6 serum against the glycan (bottom). Labels for the aglycosylated (g0) and monoglycosylated (g1) forms of the acceptor protein are shown on the right. Molecular weight (MW) markers are shown on the left. Asterisks denote constructs with a mutation that introduced a stop codon before the 6xHis tag, which prevented α-His detection. Results are representative of at least three biological replicates. (C) Cell-based glycosylation efficiency mapped onto the three-dimensional structure of Im7 in complex with ColE7 (left). Heat map analysis of glycosylation efficiency was based on densitometric quantification of the percentage glycosylated (defined as the g1/[g0 + g1] ratio) for each acceptor protein in the anti-His immunoblot. The detailed interactions between ColE7 and Im7 highlight the side chains of Im7 in the regions α1-loop12-α2 (residues 19 to 39; middle) and loop23-α3-loop34 (residues 46 to 63; right). Heat map analysis of the change in binding activity was determined by normalizing the activity measured for each glycosylated sequon variant to that of its aglycosylated counterpart. (D) Binding activity of glycosylated (gray bars) and aglycosylated (white bars) YebF-Im7 variants recovered from supernatants, measured by ELISA with ColE7 as the immobilized antigen. All data were normalized to the binding activity measured for aglycosylated YebF-Im7 lacking a sequon (wt), such that values greater than 1 (denoted by the red dashed line) indicate enhanced binding activity relative to wt Im7. Dashed boxes correspond to two regions (region 1: residues 23 to 33; region 2: residues 58 to 69) that contain many variants with increased activity. Data are the average of three biological replicates, and error bars represent the SD of the mean. (E) DSF analysis of the 15 most active YebF-Im7 variants with and without glycosylation. Tm was calculated as the midpoint of the thermal transition between the native and unfolded states. The dashed line represents the Tm of wt YebF-Im7 (38.6 ± 1.0 °C). Black bars are the average of three independent replicates, with error bars reported as SEM. Red dashed lines in D and E indicate the activity and Tm of wt YebF-Im7. Statistical analysis of all data in D and E was performed using two-way analysis of variance, with significance denoted as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; no mark, not significant.

To exhaustively explore glycosylation sequence space, we used multiple PCR primer pairs to introduce the DQNAT sequon at each position of the protein and thereby constructed every possible single-sequon Im7 variant (80 in total). A strikingly large number of these variants (78 of 80) were found to be glycosylated, and densitometry of the anti-His blots indicated that the efficiency for many variants was at or near 100% (Figure 3C and SI Appendix, Figure 3). Because glycosylation by CjPglB can occur both before and after protein folding is complete (SI Appendix, Figure 4) (34, 35), the secondary and tertiary structure around a glycosylation site may directly influence the extent to which a given site is occupied. Indeed, it has been observed that sequons located in structurally defined regions of folded acceptor proteins are poorly glycosylated and that partial unfolding is required to improve glycosylation efficiency at these sites (35, 36). To determine whether the structural context of any Im7 sequon variant was a determinant of the timing and efficiency of glycosylation, we performed in vitro, cell-free glycosylation reactions in which folded but not yet glycosylated YebF-Im7 proteins derived from culture supernatants were incubated with purified CjPglB and glycan donor. Remarkably, the results of cell-free and cell-based glycosylation were nearly identical: almost all purified Im7 variants were glycosylated at or near 100% efficiency, with few exceptions (SI Appendix, Figure 3). The observation that so many Im7 variants were efficiently glycosylated by CjPglB in vitro (that is, after folding was complete) indicates that each sequon was located either in a structurally compliant position within the folded protein (for example, flexible and surface-exposed loops) or in a region of the protein that became partially unfolded during the cell-free glycosylation reaction. Although such broad accessibility is certainly plausible given the small size and simple topology of Im7, we cannot rule out a contribution from conformationally destabilizing effects caused by replacing five-residue stretches of native amino acids in the protein. Regardless of the exact cause, these results indicate that Im7 is extremely tolerant of N-glycan installation throughout its structure in both cell-based and cell-free reactions.
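The densitometry-based efficiency used throughout this section is the g1/[g0 + g1] ratio, where g0 and g1 are the band intensities of the aglycosylated and monoglycosylated forms in the anti-His blot. A minimal Python sketch of that arithmetic follows; the function name and the example intensities are illustrative assumptions, not values from the study.

```python
def glycosylation_efficiency(g0, g1):
    """Fraction of acceptor protein that is glycosylated: g1 / (g0 + g1).

    g0 and g1 are background-corrected band intensities (arbitrary units)
    for the aglycosylated and monoglycosylated forms, respectively.
    """
    if g0 < 0 or g1 < 0:
        raise ValueError("band intensities must be non-negative")
    total = g0 + g1
    if total == 0:
        raise ValueError("no signal in either band")
    return g1 / total


# Hypothetical intensities chosen only to show the calculation.
print(f"{glycosylation_efficiency(25.0, 75.0):.0%}")
```

Using the ratio of the two bands within a single lane makes the value independent of how much total protein was loaded, which is presumably why it is convenient for comparing many variants.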

To exhaustively determine the effect of glycan attachment on the properties of the neoglycoproteins, we first performed a multiwell enzyme-linked immunosorbent assay (ELISA), with purified ColE7 as the immobilized antigen, to quantify the binding activity of all 80 Im7 sequon variants with and without glycosylation. Native Im7 interacts with ColE7, a 60-kDa bacterial toxin that is cytotoxic in the absence of its cognate Im7 inhibitor (37). With a focus on multiplexability, we chose to assay the YebF-Im7 fusions directly because 1) doing so requires no molecular reformatting of the expression constructs, 2) the fusions can be isolated as relatively pure species from cell-free supernatants, which bypasses the need for extensive purification, and 3) introduction of the small YebF domain had no measurable effect on ColE7-binding activity (SI Appendix, Figure 5A). Nearly two-thirds of the YebF-Im7 fusions were either unaffected by glycosylation or rendered inactive by introduction of the DQNAT motif alone, particularly in the contiguous stretch between residues 50 and 57 of Im7. The remaining third, however, exhibited significantly altered binding activity that was attributable to the presence of the N-glycan (Figure 3D). These glycosylation-induced effects clearly depended on the precise location of the modification. Indeed, some of the most striking increases in the binding activity of glycosylated variants relative to their aglycosylated counterparts were observed at transitions between different types of secondary structure (for example, Im7N33, Im7N58, and Im7N65). These results are particularly noteworthy given that naturally occurring sequons are more likely to be found at sites where the secondary structure changes (33).

Among the Im7 neoglycoproteins whose activity was most significantly affected, either positively or negatively, by N-glycosylation, most were located in two distinct regions spanning residues 23 to 33 and 58 to 69 (Figure 3D). These regions lie within the two arms of Im7 (one in α1-loop12-α2, residues 19 to 39, and the other in loop23-α3-loop34, residues 46 to 63) that interact extensively with a continuous region of ColE7 in the crystal structure (Figure 3C) (27). The two interfaces are charge complementary, and charge interactions are largely responsible for the tight and specific binding between the two proteins; hence, it is not surprising that binding activity was sensitive to N-glycan attachment in the vicinity of these interfaces. It should be noted that the presence of an N-glycan at some of these positions had a uniquely modulatory effect, because substitution of DQNAT alone at the same positions generally had little effect on activity, as shown by the ColE7 binding measured for aglycosylated Im7 variants relative to wt Im7 (SI Appendix, Figure 5B).
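Two normalizations underlie the activity comparisons above: each ELISA signal is expressed relative to aglycosylated wt YebF-Im7 (so that values above 1 indicate enhanced binding), and the glycan-specific effect is the ratio of a glycosylated variant to its aglycosylated counterpart. The Python sketch below shows both calculations; the function names and the example signals are hypothetical and serve only to illustrate the arithmetic.

```python
def normalized_activity(signal, wt_signal):
    """Express an assay signal relative to the wt reference (wt = 1.0)."""
    if wt_signal <= 0:
        raise ValueError("wt reference signal must be positive")
    return signal / wt_signal


def glycan_fold_change(glyc_signal, aglyc_signal):
    """Ratio of glycosylated to aglycosylated signal for one sequon variant.

    A value above 1 suggests the glycan itself, rather than the sequon
    substitution, increased activity.
    """
    if aglyc_signal <= 0:
        raise ValueError("aglycosylated signal must be positive")
    return glyc_signal / aglyc_signal


# Hypothetical raw signals (arbitrary units), not data from the study.
wt, aglyc, glyc = 1.2, 0.6, 1.8
print(normalized_activity(aglyc, wt), normalized_activity(glyc, wt))
print(glycan_fold_change(glyc, aglyc))
```

Keeping the two ratios separate distinguishes a variant that is inactive because of the DQNAT substitution from one whose activity changes only when the glycan is attached.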

To determine whether any of the glycosylation-induced increases in binding activity were related to stabilization of the native fold, the most active Im7 neoglycoproteins were subjected to differential scanning fluorimetry (DSF) with SYPRO Orange dye in a real-time PCR instrument. Previous studies have shown that melting temperature (Tm) values obtained by DSF correlate well with those measured by circular dichroism (CD) thermal denaturation (38). Here, too, we observed excellent agreement between the two methods, both of which yielded a Tm value for wt Im7 (~39 °C; SI Appendix, Figure 5C and D) that agreed with the previously reported value (37). Importantly, the presence of the small YebF domain did not significantly change the Tm value of Im7 (SI Appendix, Figure 5D), consistent with its lack of effect on ColE7-binding activity. We also confirmed that DSF results obtained with YebF-Im7 taken directly from cell-free supernatants were indistinguishable from those obtained with more extensively purified YebF-Im7 (SI Appendix, Figure 5D). Using DSF, the average Tm values of the glycosylated and aglycosylated versions of each Im7 variant were measured, and the change in unfolding temperature, ΔTm (glycosylated minus aglycosylated), was calculated such that a positive ΔTm indicates an increase in structural order and a decrease in conformational flexibility owing to addition of the glycan. Several variants exhibited positive ΔTm values, with the largest increases corresponding to glycan installation at N33, N59, N60, N65, and N80 (Figure 3E). In contrast, glycans at N10, N58, and N64 caused the largest decreases in Tm, indicating that glycan-induced changes in protein structure destabilized the protein.
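The Tm and ΔTm calculations can be sketched in Python as follows. Tm is taken here as the temperature at which the fluorescence signal first crosses the halfway point between its minimum and maximum, found by linear interpolation. This is a deliberately simple stand-in for "the midpoint of the thermal transition" and is an assumption of this sketch; the fitting procedure actually used is not described in the text, and the melt curve below is invented for illustration.

```python
def midpoint_tm(temps, signal):
    """Estimate Tm as the first upward crossing of the half-maximal signal.

    temps and signal are equal-length sequences ordered by increasing
    temperature. Linear interpolation is used between the two flanking points.
    """
    half = (min(signal) + max(signal)) / 2.0
    points = list(zip(temps, signal))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 < half <= s1:
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("no upward transition through the half-maximal signal")


def delta_tm(tm_glycosylated, tm_aglycosylated):
    """Positive values mean the glycosylated form unfolds at a higher temperature."""
    return tm_glycosylated - tm_aglycosylated


# Invented melt curve (temperature in deg C, fluorescence in arbitrary units).
temps = [30, 35, 40, 45, 50]
signal = [0.0, 0.0, 1.0, 2.0, 2.0]
print(midpoint_tm(temps, signal))
print(delta_tm(41.5, 39.0))
```

Taking the first upward crossing keeps the estimate on the rising phase of the curve even if fluorescence declines again at high temperature.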

We next turned our attention to RNase A. Like Im7, RNase A has been studied intensively from a structure-function standpoint and has been central to understanding many aspects of enzymology, biological chemistry, and protein folding and stability. We chose RNase A because 1) it is a relatively small protein of 124 residues but has a more complex topology than Im7, with all major types of secondary structure, namely α-helices, β-sheets, and turns; 2) the naturally glycosylated form of RNase A, RNase B, contains a single N-linked oligosaccharide at N34, and a crystal structure is available (39); 3) glycosylation at N34 has no significant effect on secondary or tertiary structure (39) but does appear to alter thermal stability (40), although this is disputed (41); and 4) RNase A modified with an optimal bacterial sequon at the native N34 glycosylation site (RNase AN34) can be glycosylated by CjPglB in both cell-based and cell-free reactions (34, 35). For these reasons, RNase A is an ideal target for SSGM.

Extracellular secretion of glycosylated YebF-RNase AN34 was observed in colony blots and immunoblots (Figure 2C), confirming the compatibility of RNase A with glycoSNAP screening. An SSGM library was created by applying the multiplex inverse PCR method to YebF-RNase A plasmid DNA, which resulted in 93% sequence coverage in the preselection library as determined by next-generation sequencing (SI Appendix, Figure 2). CLM24 cells carrying plasmids encoding the requisite C. jejuni glycosylation machinery were transformed with the SSGM library and subjected to glycoSNAP screening. A total of about 100 glycosylation-positive colonies were randomly selected from two membranes and subjected to sequencing analysis. Of these, only 50 were nonredundant because many sequences were isolated multiple times (for example, RNase AN41 and RNase AN122 were each isolated seven times; Figure 4A). The sequons of these positive hits were evenly distributed throughout the primary sequence and were found in every type of secondary structure element, similar to the results with Im7. Immunoblot analysis confirmed that all selected clones were glycosylated, and densitometry of the anti-His blots indicated that the efficiency for most clones was at or near 100% (Figure 4B and SI Appendix, Figure 6A and B). We also performed a theoretical analysis of each of these RNase A glycosite variants using NetNGlyc 1.0 (www.cbs.dtu.dk/services/NetNGlyc/), a web-based tool that predicts N-glycosylation sites in human proteins using artificial neural networks that examine the sequence context of N-X-S/T sequons (40). Interestingly, a total of 18 glycosites, concentrated mainly in the C-terminal half of the protein, received glycosylation probability scores below 50% (SI Appendix, Figure 6C) and were therefore predicted to be glycosylated poorly, if at all. In particular, RNase AN111 and RNase AN122 both scored below 30%, yet both were glycosylated very efficiently in cells (and in vitro, as described below).

Construction and characterization of the RNase A neoglycoprotein library. (A) Primary sequence and predicted secondary structure of bovine pancreatic RNase A. Asterisks indicate the location and frequency of glycosite hits isolated using SSGM. The predicted structure is adapted from PDB ID code 1RBX. (B) Mapping of cell-based (left) and cell-free (right) glycosylation efficiency onto the three-dimensional structure of RNase A. Efficiency is the percentage of glycosylation (defined as the g1/[g0 + g1] ratio) determined for each neoglycoprotein from anti-histidine immunoblots. (C) Enzymatic activity of glycosylated (grey bars) and non-glycosylated (white bars) RNase A variants recovered from culture supernatants. All data were normalized to the activity measured for aglycosylated YebF-RNase A lacking a sequon (wt). Data are the average of three biological replicates, and error bars represent the SD of the mean. (D) DSF analysis of YebF-RNase A variants with and without glycosylation. Tm was calculated as the midpoint of the thermal transition between the native and unfolded states. The dotted line represents the Tm of wt YebF-RNase A (59.0 ± 0.1 °C). Black bars are the average of three independent replicates, and error bars are reported as SEM. The red dotted lines in C and D indicate the activity and Tm of wt YebF-RNase A. Statistical analysis of all data in C and D was performed by two-way analysis of variance, with significance indicated as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; no mark, not significant.
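The glycosylation percentage used in panel B can be sketched in a few lines of Python. This is a minimal illustration, assuming that the denominator of the ratio is the sum of the two band intensities (g0 + g1) and that densitometry yields one non-negative intensity per band. The function name and the example intensities are hypothetical and are not taken from the study.

```python
def glycosylation_efficiency(g0: float, g1: float) -> float:
    """Percent glycosylation from two band intensities.

    g0: intensity of the aglycosylated band
    g1: intensity of the monoglycosylated band
    Returns 100 * g1 / (g0 + g1).
    """
    if g0 < 0 or g1 < 0:
        raise ValueError("band intensities must be non-negative")
    total = g0 + g1
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * g1 / total


# Hypothetical densitometry readings (arbitrary units)
print(glycosylation_efficiency(g0=25.0, g1=75.0))
```

Because the ratio uses both bands from the same lane, it does not depend on how much total protein was loaded.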

To investigate whether the structural context of a sequon affects the possible timing of PglB-mediated glycan installation, we performed cell-free glycosylation of folded RNase A variants. Although some variants were glycosylated equally well in cell-based and cell-free reactions (for example, RNase AN46 and RNase AN64), a large number of variants unexpectedly showed significantly lower glycosylation levels under cell-free conditions (Figure 4B and SI Appendix, Figure 6 A and B). Most notable among these were variants N34, N35, N36, N43, N51, N61, N69, N72, N80, N89, and N104, all of which were efficiently glycosylated in cells but showed little or no detectable glycosylation in vitro. These sequons occur at positions that may be accessible to the OST during translation/translocation, when the protein is unfolded, but that become inaccessible once the protein has folded. Indeed, the natural N-glycosylation site at N34 is located in a structured domain, which suggests that the poor cell-free glycosylation at this position (and possibly at the nearby N36 and N43 sites) arises because the sequon is inaccessible in the folded state. Folding-dependent recognition of this site has been observed previously (34, 35) and, together with the results presented here, supports a model in which cell-based glycosylation of these specific sequons involves glycan installation before folding, either cotranslocationally or posttranslocationally (SI Appendix, Figure 4).

To determine the consequences of glycosylation at the 50 unique sites, the glycosylated and non-glycosylated versions of each sequon variant were evaluated for their ability to catalyze the hydrolysis of phosphodiester bonds in RNA. Although the addition of YebF had little effect on RNase A activity (SI Appendix, Figure 7A), more than half of the RNase A variants were inactivated by substitution of the DQNAT sequon (Figure 4C). To determine whether this was due to the substitution of five residues in the target protein, which CjPglB requires for optimal recognition (42), we mutated RNase A more conservatively at a select number of sites. Specifically, we generated minimal sequons (DXNXT/S or XXNXT/S, where X represents the native amino acid), which in most cases required only one or two amino acid changes. Each of these mutants was completely inactive except for RNase AN55 with a DVNAT sequon, which retained some activity, although still significantly less than that of the wt enzyme (SI Appendix, Figure 7B). Hence, RNase A can be inactivated even by relatively minor sequence perturbations at these positions, not only by the less subtle DQNAT substitution. Closer inspection revealed that most variants with little or no activity correspond to sequon substitutions at positions expected to disrupt catalytically important residues or disulfide bonds (Figure 4C and SI Appendix, Supplementary Results).

Among the RNase A neoglycoproteins that retained function, only eight (sequons at N34, N35, N36, N51, N53, N61, N89, and N104) showed activity on par with wt RNase A (>50%), and none was more active than its aglycosylated counterpart (Figure 4C). In the case of RNase AN119, introduction of the DQNAT sequon completely eliminated catalytic activity, consistent with the previous finding that the relative activity of the H119N mutant was reduced to less than 1% of that of wt RNase A, with catalytic efficiency reduced 100- to 1,000-fold depending on the substrate used (43). Although this residue is important for catalysis, glycosylation at this position partially restored enzymatic activity, indicating an N-glycan-dependent gain of function.

To determine whether glycosylation affected stability, we again used DSF to analyze the most active RNase A neoglycoproteins, along with RNase AN93, which was randomly chosen as a representative inactive variant. The measured Tm values of wt YebF-RNase A and its unfused counterpart were both about 59 °C (SI Appendix, Figure 7C), in good agreement with previous findings (41). The Tm values of all YebF-RNase A variants ranged from 58 to 63 °C (Figure 4D). Most variants, including RNase AN119, exhibited positive ΔTm values relative to their non-glycosylated counterparts, indicating that glycan attachment at N119, which restored activity, also helped stabilize the protein. In contrast, RNase AN89 and RNase AN93 showed large negative ΔTm values, which coincided with the slight decrease in activity caused by glycan attachment at N89 and the complete inactivation at N93.

We next studied glycosylation of antibody variable domains, which is observed in about 15% of serum IgG and contributes to diversification of the B cell antibody repertoire (6). Although glycan installation in the variable domains of Fab arms has long been known, the rules governing the selection of N-glycosylation sites in Fab domains during somatic hypermutation, and the functional consequences of the attached glycans, remain poorly understood. To study this phenomenon systematically using SSGM, the two variable domains, VH and VL, of a human anti-HER2 monoclonal antibody were joined by a flexible linker to form scFv-HER2, which was then modified with YebF at its N-terminus and a DQNAT motif at its C-terminus. Extracellular secretion of glycosylated YebF-scFv-HER2DQNAT was observed in colony blots and immunoblots (Figure 5A), confirming that scFv-HER2 is compatible with glycoSNAP screening. Because variable domain glycosylation is subject to selection mechanisms that depend on the nature of the antigen (6), we modified the SSGM strategy by using both SBA lectin and the extracellular domain (residues 1 to 652) of human HER2 (HER2-ED), which was avidly bound by scFv-HER2DQNAT fused to YebF (SI Appendix, Figure 8A). In this way, two-color screening could be used to identify colonies positive for both glycosylation and antigen binding, as demonstrated with the YebF-scFv-HER2DQNAT construct (Figure 5A). Next, we constructed an SSGM library and subjected CLM24 cells carrying plasmids encoding the library and the Campylobacter jejuni glycosylation machinery to two-color glycoSNAP screening. About 60 double-positive hits were isolated from the membrane, of which 21 were determined to be nonredundant (for example, N58 in VL and N42 in VH were each isolated 12 times) (Figure 5B), and their glycosylation was subsequently confirmed by Western blot and densitometric analysis (SI Appendix, Figure 8 B and C). The sequons of these hits were sparsely distributed throughout the primary sequence, with most clustered just after the second and third complementarity-determining regions (CDRs) of the VL domain and in the flexible linker, indicating a clear selection bias for specific sites that tolerate glycosylation without interfering with binding function. Interestingly, some of the identified sequons occurred in CDR2 of the VL domain and in CDR1 and CDR2 of the VH domain, consistent with the naturally occurring IgG repertoire, in which N-glycosites occur preferentially in the CDRs (6).

Construction and characterization of the scFv-HER2 neoglycoprotein library. (A) Immunoblot analysis of colony secretions (left and middle) and of acceptor protein in periplasmic fractions (right) from Escherichia coli CLM24 carrying plasmids encoding scFv-HER2DQNAT and the requisite N-glycosylation machinery. Blots were probed with anti-polyhistidine antibody (α-His) to detect the acceptor protein, with SBA or hR6 serum to detect the glycan, and with HER2-ED to detect antigen binding. The bottom color panels depict overlays of the α-His and SBA blots or of the SBA and HER2 blots (merge). Arrows indicate the aglycosylated (g0) and monoglycosylated (g1) forms of scFv-HER2DQNAT. Molecular weight (MW) markers are shown on the left. Results are representative of at least three biological replicates. (B) Frequency and position of N-glycosylation sites in scFv-HER2DQNAT glycovariants isolated using SSGM. (C) Binding activity of glycosylated (grey bars) and non-glycosylated (white bars) scFv-HER2DQNAT variants, as measured by ELISA with HER2-ED as the immobilized antigen. All data were normalized to the binding activity measured for non-glycosylated scFv-HER2 lacking a sequon (wt), such that values greater than 1 (indicated by the dashed line) indicate enhanced binding activity relative to wt scFv-HER2. Data are the average of three biological replicates, and error bars represent the SD of the mean. (D) DSF analysis of YebF-scFv-HER2 variants with and without glycosylation. Tm was calculated as the midpoint of the thermal transition between the native and unfolded states. The dotted line represents the Tm of wt YebF-scFv-HER2 (68.2 ± 0.1 °C). Black bars are the average of three independent replicates, and error bars are reported as SEM. The red dotted lines in C and D indicate the activity and Tm of wt YebF-scFv-HER2. Statistical analysis of all data in C and D was performed by two-way analysis of variance, with significance indicated as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; no mark, not significant.

Functionally, all 21 scFv-HER2 hits showed HER2-ED binding activity above background (Figure 5C), which was expected because the screening process was adapted to include antigen binding. Importantly, nine of these neoglycoproteins (N58, N64, and N109 in VL; N3, N4, N9, and N10 in the linker; N42 and N113 in VH) showed increased binding relative to their non-glycosylated counterparts, and most of them also bound better than the parental scFv-HER2. For the five clones showing the greatest increase in activity due to glycosylation, we measured Tm values and found that glycan attachment generally did not affect stability (Figure 5D). One exception was N64 VL, whose Tm increased by 2.6 °C upon addition of the N-glycan. Overall, these results agree with several previous studies showing that the contribution of variable-region glycans to antibody binding and stability depends on the precise location of the glycan (6, 44), and they suggest that glycosylation in this region may be a useful strategy for fine-tuning the properties of IgG antibodies and their engineered derivatives.

To test whether protein structural analysis could explain the observed effects of sequon substitution and glycosylation, we modeled the sequon-substituted variants, with and without glycosylation, and calculated simple geometric measures (secondary structure, burial, distance to the binding site, and surface area) as well as Rosetta energy estimates (stability and interface score) for each. Unfortunately, for the Im7 and RNase A neoglycoproteins, none of these factors correlated significantly with activity or stability (SI Appendix, Figures 9-13 and SI Appendix, Supplementary Results). Note that these metrics may not be very informative for RNase A, because its activity is explained mainly by disruption of the active site and disulfide bonds, which these metrics do not capture.

For Im7, we examined the structures of five selected glycosylated variants on which N-glycan attachment had different experimental effects and for which changes in Rosetta total score correlated with affinity (colored dots in SI Appendix, Figure 10F). In the case where glycosylation had no effect on activity (Im7N46), the ensemble of glycan conformations showed no interaction with the ColE7 protein, whereas in the case where glycosylation reduced activity (Im7N31), the ensemble was broad and disordered because the glycan clashed with ColE7 residues (high Rosetta energy) (Figure 6A). For the three cases where glycosylation increased activity (Im7N30, Im7N49, and Im7N58), the glycan ensemble made favorable interactions with ColE7 (Figure 6B).

Computational analysis of neoglycoprotein variants. (A and B) Ensembles of Im7 glycosylated mutants 46, 31, 30, 49, and 58 (agly/ai = 1.04, 0.55, 1.36, 1.91, and 3.44), showing low-energy conformations of the conjugated glycans. N-glycans are shown as lines (oxygen, red; nitrogen, blue; carbons of each model in a different color). Side chains of Im7 and ColE7 that interact with the N-glycans are shown as sticks. The structure of the scFv-HER2 VL (red) and VH (blue) domains in complex with the HER2 protein (grey) is shown in the lower left corner. Regression analysis of the log activity ratio (glycosylated/wt) against (C) burial of the sequon substitution site (approximated by the number of Cβ atoms within 8.5 Å per residue), (D) distance from the sequon substitution site to the nearest HER2 residue, and (E) Rosetta total score. In all three panels, the dark red line is the respective regression line. The color of the dots in E indicates the secondary structure at the sequon substitution site: orange, green, and blue correspond to α-helix, β-strand, and loop regions, respectively. N58 VL (red circle) shows the largest glycosylation-dependent increase in binding activity and is discussed in the text. (F) Representation in the wt structure of the sites used to analyze sequon substitution (36 VL, 108 VL, and 113 VH) and glycosylation (58 VL). The color of the side chains reflects their respective secondary structure. (G) Glycan arrangements (orange sticks) from eight low-energy conformations of the glycosylated N58 VL variant of scFv-HER2, revealing glycan-HER2 interactions that may account for the increased binding activity.

We also compared the experimental binding activity of scFv-HER2 variants with multiple geometric and Rosetta metrics. Unlike for Im7 and RNase A, scFv-HER2 activity correlated with many of our metrics. First, sequon burial reduced the binding affinity of scFv-HER2 for its antigen in both the glycosylated (R² = 0.43) and non-glycosylated (R² = 0.21) states (Figure 6C and SI Appendix, Figure 9B, respectively). Similarly, the closer the sequon was to the paratope, the more likely the glycosylated (R² = 0.23) and non-glycosylated (R² = 0.20) variants were to have reduced activity (Figure 6D and SI Appendix, Figure 9C, respectively). Buried surface area also correlated with the activity of the glycosylated variants (R² = 0.19; SI Appendix, Figure 10E). The strongest predictors, however, were the Rosetta scores. In the glycosylated state, activity correlated with the Rosetta total score (R² = 0.49; Figure 6E) and the interface score (R² = 0.63; SI Appendix, Figure 10G). The total score of the non-glycosylated antibody-antigen complex also correlated with experimental binding activity (R² = 0.49; SI Appendix, Figure 9F). These Rosetta scores were driven mainly by van der Waals complementarity and, to a lesser extent, by electrostatics (SI Appendix, Figures 11 and 12).
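The R² values quoted above come from regressing the log activity ratio on a single structural metric. The sketch below shows that calculation for an ordinary least-squares line in plain Python. It is an illustration only: the function name, the choice of base-10 logarithm, and all numbers are assumptions made for the example and are not the authors' script or data (R² itself does not depend on the base of the logarithm).

```python
import math


def linear_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical values: burial (neighbor count) and activity ratio (variant/wt)
burial = [6, 9, 12, 15, 18]
activity_ratio = [1.8, 1.1, 0.9, 0.5, 0.6]
log_ratio = [math.log10(r) for r in activity_ratio]

slope, intercept, r2 = linear_r2(burial, log_ratio)
print(f"slope = {slope:.3f}, R^2 = {r2:.2f}")
```

A negative slope in such a fit corresponds to the trend described in the text, where more deeply buried sequons are associated with lower binding activity.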

For the non-glycosylated variants, we selected three for more in-depth analysis: two with low binding activity and poor Rosetta scores (N36 VL and N113 VH; black circles in SI Appendix, Figure 11A) and one with high activity and a favorable Rosetta score (N108 VL; red circle in SI Appendix, Figure 11A). The N36 VL and N113 VH sites both lie on β-strands in a tightly packed region of the anti-HER2 antibody opposite the antigen-binding site (Figure 6F, green sticks). The loss of stability is attributable to steric clashes of the substituted sequon in (or near) this tightly packed region of the protein [the Rosetta steric clash term (vdW_rep) is 90.2 and 79.8 Rosetta energy units (REU) for N36 VL and N113 VH, respectively]. Upon glycosylation, the clashes in the Rosetta models worsened, corresponding to low activity (black circles in SI Appendix, Figure 11A). In contrast, site N108 VL is located at the C-terminus of VH (Figure 6F, blue sticks). Here sequon substitution had a relatively small effect on electrostatic interactions (-6.2 REU) and a larger effect on the repulsive van der Waals term (-28.0 REU), indicating that the new side chains are accommodated in a less tightly packed region. Similar results have been reported for substitution mutations in human monoclonal antibodies (45).

To understand how N-glycosylation can increase the binding activity of scFv-HER2, we chose mutant N58 VL, because the non-glycosylated variant had 26% higher activity than wt scFv-HER2 and addition of the glycan increased binding by 1.8-fold. Residue N58 VL is located at the transition between strands 1 and 2 (Figure 6F, blue backbone). In the glycosylated structures generated by Rosetta, the low-energy states showed interface contacts between the glycan and HER2 surface residues (Figure 6G), which improved both the Rosetta total score and the interface score (red circles in Figure 6E and SI Appendix, Figure 10G) and explain the increased binding activity as the result of favorable glycan-antigen contacts.

In this study, we developed a protein engineering workflow called SSGM for constructing large libraries of neoglycoproteins for virtually any POI and for characterizing the consequences of glycan installation. The utility and flexibility of the technique were demonstrated using three acceptor proteins with different structures and functions: bacterial Im7, bovine RNase A, and human scFv-HER2. Specifically, each of these proteins was subjected to a systematic "sequon walking" procedure that enabled the creation of a synthetic gene library in which N-glycosylation sites (most of them de novo) were introduced along the protein backbone. Screening of these libraries using glycoSNAP (23) revealed that many positions in each protein were efficiently N-glycosylated. Although extended regions and loops tended to be more susceptible to glycosylation, all types of secondary structure were found to be glycosylated, consistent with the observation that naturally occurring N-glycans are also present in all forms of secondary structure (33). For RNase A in particular, a large number of efficiently glycosylated sites (18/50) were predicted to have very low glycosylation potential, highlighting the need for large-scale experimental studies of glycosylation, such as the one described here, that can help improve predictive tools. To this end, high-throughput techniques that use mass spectrometry to quantify glycosylation efficiency (46, 47) could further improve the method in the future.

The studies performed here also provide insight into the possible timing of glycosylation and its effect on the folding process. For example, Im7 tolerated glycans at nearly every position, even where the target asparagine side chain points inward and is considered buried (for example, positions N7, N68, and N76). Because these buried positions cannot be physically glycosylated by PglB when the target protein is in its folded state, they must be glycosylated either cotranslationally or during fluctuations to a partially unfolded state that provide access to the site. After glycosylation, because Im7 may be unable to fold back into its native structure, it must adopt a different conformation to accommodate the newly added glycan, which is feasible given that Im7 is very flexible (48). In the case of RNase A, several sites (such as N34 and N36) were identified that were efficiently glycosylated in cells but showed almost no glycosylation in vitro (in the folded state), providing clear evidence for glycan installation before folding, in a manner that may resemble the cotranslocational process in mammalian cells (49). Compared with Im7, many RNase A variants had lower overall glycosylation efficiency, consistent with the more stable folded structure of this protein and the lower accessibility of its buried sites.

In addition to discovering glycosylation sites, the SSGM workflow allows exploration of how these site-directed glycan "mutations" contribute to the biological and biophysical properties of each POI. In this way, SSGM is conceptually analogous to combinatorial alanine-scanning mutagenesis, which allows the importance of individual amino acids to protein structure and function to be determined systematically (50–52). Consistent with the known modulatory effects of N-glycans (4, 5), many neoglycoprotein derivatives of Im7, RNase A, and scFv-HER2 exhibited detectable changes in stability and activity caused by the covalent attachment of N-glycans at precise locations in the protein backbone. For example, N-glycans installed in the center of α-helices negatively affected activity (for example, positions 19, 42, and 72 in Im7), whereas those installed in the transitions between different types of secondary structure and in turn motifs promoted enhanced activity and stability in some cases (for example, positions 33, 49, 58, 59, 60, 61, 65, 67, 68, 69, 78, and 80 in Im7). These findings generally agree with the folding and stability effects of GlcNAc2 disaccharides attached at discrete positions in Im7 (22), and they also offer clues as to why natural N-glycosylation sites occur at increased frequency in turns and bends, especially at points of change in secondary structure, and at lower frequency within ordered helices (33). Despite this general agreement with previous studies, some notable differences emerged. For example, in our hands, Im7 glycosylated at position 27 with the GalNAc5(Glc)GlcNAc heptasaccharide was more active than, but equally stable as, its non-glycosylated counterpart, whereas EPL-derived Im7 modified with chitobiose at residue 27 was significantly more stable than unmodified Im7 (note that no activity data were reported) (22). Similarly, RNase AN34 glycosylated with GalNAc5(Glc)GlcNAc showed nearly the same activity as non-glycosylated RNase AN34 (and wt RNase A), whereas attachment of oligomannose glycans at N34 was previously observed to reduce activity by more than threefold (53). The notion that discrete glycan structures attached to the same site in a protein can have different effects is not unprecedented and has been documented for other glycoproteins (54, 55). It will therefore be worthwhile to extend SSGM to alternative glycan structures, including, for example, Man3GlcNAc2 and other human-like N- and O-linked glycans that have been engineered in E. coli (34, 56, 57), so that the consequences of different glycan structures at discrete locations can be studied systematically. Moreover, by combining human-like glycan structures with the glycoSNAP screening tool, we anticipate that improved glycoprotein therapeutics can be designed using the SSGM workflow described here.

The finding that N-glycan attachment significantly increased the binding activity of several glycosite variants of Im7 and scFv-HER2 suggests that SSGM may become a useful tool for adding N-glycans at de novo sites in proteins to tune their biological and biophysical properties. Although glycans are known to increase binding affinity, to our knowledge a mechanism involving the formation of peripheral interface contacts has not been proposed. Because SSGM provides access to an unprecedentedly large number of intact neoglycoproteins (151 in this study alone), the discovery of such sites is accelerated, and the effects of N-glycan installation on protein structure and activity can be readily cataloged using multiplexable assays such as those shown here. Although no clear rules governing the effects of glycosylation were revealed here, we expect that sequon walking on a larger, even proteome-wide, scale could provide access to datasets that allow broader generalization and even prediction of the effects of glycosylation. Nevertheless, the computational analysis showed that interactions between the glycan and the binding partner can change binding activity (positively or negatively), and that enhanced binding may result from favorable interactions between low-energy glycan conformations and the binding partner. For example, the Im7N58 variant, which showed the greatest increase in binding activity upon glycosylation, gains new contacts with its binding partner E7 through the glycan, which increased binding activity 3.5-fold. Similarly, for the scFv-HER2 mutant N58 VL, which exhibited significantly higher antigen-binding activity than the parental scFv-HER2, the heptasaccharide glycan creates new contacts between scFv-HER2 and HER2-ED and buries more surface area upon binding. Thus, even though part of the enhanced binding comes from the sequon substitution alone, possibly from additional contacts made by the long Q57 side chain or from stabilization of the CDR L2 loop (residues 51 to 57 in VL) by the sequon, most of the effect comes from the N-glycan itself. Importantly, this observation is consistent with previous findings that glycans attached near (but not within) the antigen-binding site can increase affinity (58). In summary, our results show that SSGM can be used to rapidly identify de novo sites on the protein backbone for the strategic placement of N-glycans, thereby significantly enhancing the biological and/or biophysical properties of the resulting neoglycoproteins.

E. coli strain DH5α was used for all molecular biology, including plasmid construction, site-directed mutagenesis, and SSGM library construction. BL21(DE3) was used to purify ColE7, which was used to measure Im7 binding activity in ELISA format. All glycosylation studies were performed using E. coli strain CLM24 (59), which was initially grown at 37 °C in Luria-Bertani (LB) medium containing the appropriate antibiotics at the following concentrations: 20 μg/mL chloramphenicol (Cm), 100 μg/mL trimethoprim (Tmp), and 50 μg/mL spectinomycin (Spec). When cells reached mid-log phase, protein expression was induced by adding 0.1 mM isopropyl-β-d-thiogalactoside (IPTG) and 0.2% (vol/vol) l-arabinose, after which cells were grown at 30 °C for 16 to 20 hours. For all plasmid construction, see SI Appendix, Supplementary Methods.

The SSGM mutagenesis libraries were constructed by multiplex inverse PCR (32) followed by T4 ligation. Each pTrc99S-YebF-POI plasmid was used as the template for PCR amplification with a specially designed primer set in which the DNA sequence 5'-GAT CAG AAT GCG ACC-3' was included at the 5' end of each forward primer, enabling the replacement of five adjacent amino acids with DQNAT. Before PCR, the forward primers were phosphorylated using T4 polynucleotide kinase (New England Biolabs) to facilitate T4 ligation. PCR was performed using Phusion polymerase (New England Biolabs), and the PCR products were gel-purified from the product mixture to eliminate nonspecific products. The resulting PCR products were self-ligated using T4 ligase (New England Biolabs) to obtain the desired SSGM plasmid library, which was then used to transform highly competent DH5α cells and isolated using the QIAprep Spin Miniprep kit (Qiagen) according to the manufacturer's instructions. For next-generation sequencing, see SI Appendix, Supplementary Methods.
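At the protein level, the library described above amounts to sliding a five-residue DQNAT window along the sequence. The following Python sketch enumerates those variants in silico and checks that the primer insert encodes DQNAT. It is a simplified illustration, not the primer-design procedure used in the study: the function names, the toy sequence, and the convention of labeling each variant by the 1-based position of the sequon asparagine are assumptions, and the codon table covers only the five codons of the insert.

```python
SEQUON = "DQNAT"
SEQUON_DNA = "GATCAGAATGCGACC"  # 5'-GAT CAG AAT GCG ACC-3' from the forward primers

# Only the five codons present in the insert (standard genetic code)
CODONS = {"GAT": "D", "CAG": "Q", "AAT": "N", "GCG": "A", "ACC": "T"}


def translate(dna: str) -> str:
    """Translate the insert codon by codon using the small table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sequon_walk(protein: str, sequon: str = SEQUON) -> dict:
    """Return every variant in which five adjacent residues are replaced by the sequon.

    Keys are labels such as 'N34', giving the 1-based position of the sequon
    asparagine in the variant (an assumed naming convention).
    """
    k = len(sequon)
    asn_offset = sequon.index("N")
    variants = {}
    for start in range(len(protein) - k + 1):
        variant = protein[:start] + sequon + protein[start + k:]
        variants[f"N{start + asn_offset + 1}"] = variant
    return variants


# Toy 10-residue sequence (hypothetical, not one of the proteins in the study)
library = sequon_walk("MKTAYIAKQR")
print(translate(SEQUON_DNA))
print(len(library), sorted(library, key=lambda label: int(label[1:])))
```

A protein of length L yields L - 4 such variants, each the same length as the parent, which is why a single primer set per position is enough to cover the whole backbone.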

Screening of SSGM libraries was carried out using glycoSNAP, as described previously (23). Briefly, E. coli strain CLM24 carrying pMW07-pglΔB and pMAF10 was transformed with the corresponding SSGM library plasmids, and the resulting transformants were grown overnight at 37 °C on LB agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, and 50 μg/mL Spec. On the second day, nitrocellulose transfer membranes were cut to fit 150-mm plates, prewetted with sterile phosphate-buffered saline (PBS), and placed onto LB agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, 50 μg/mL Spec, 0.1 mM IPTG, and 0.2% (wt/vol) l-arabinose. Library transformants were replicated onto 142-mm nitrocellulose membrane filters (Whatman, 0.45 µm), which were then placed colony side up on the transfer membranes and incubated at 30 °C for 16 hours. The nitrocellulose transfer membranes were washed in Tris-buffered saline (TBS) for 10 minutes, blocked in 5% bovine serum albumin for 30 minutes, and probed for 1 hour with fluorescein-labeled SBA (Vector Laboratories, FL-1011) and either Alexa Fluor 647 (AF647)-conjugated anti-histidine antibody (R&D Systems, IC0501R) or HER2-ED (R&D Systems, 10126-ER) conjugated to AF647 (Thermo Fisher Scientific, A37573), following the manufacturer's instructions. All positive hits were restreaked onto fresh LB agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, and 50 μg/mL Spec and grown overnight at 37 °C. Individual colonies were grown in liquid culture and subjected to DNA sequencing to confirm the location of the glycosites and to protein glycosylation analysis as described below.

For all methods related to protein separation, western blot analysis, protein activity determination, MS analysis, and ELISA, please refer to the SI Appendix, Supplementary Methods.

The methods for purifying Campylobacter jejuni PglB and isolating LLO from glycoengineered E. coli were previously described (59). In vitro, cell-free glycosylation was performed in a 30 μL reaction containing 20 μL of the supernatant fraction containing non-glycosylated YebF-Im7 or 20 μL of the periplasmic fraction containing YebF-RNase A, 2 μg purified CjPglB and 5 μg of extracted LLO in a cell-free glycosylation buffer [10 mM Hepes, pH 7.5, 10 mM MnCl2 and 0.1% (wt/vol) n-dodecyl-β-d-maltoside]. The reaction mixture was incubated at 30°C for 16 hours, and then 10 µL of 4×Laemmli sample buffer containing 5% β-mercaptoethanol was added, then boiled at 100°C for 15 minutes, and then subjected to Western blot analysis.

The binding activity of Im7 and scFv-HER2 variants was determined by standard ELISA. Briefly, Costar 96-well ELISA plates (Corning) were coated at 4 °C with 50 μL of 5 μg/mL purified ColE7 in 0.05 M sodium carbonate buffer (pH 9.6) for Im7 variants, or with 50 μL of 0.2 μg/mL HER2-ED (Sino Biological, 10004-HCCH) in PBS buffer for scFv-HER2 variants. After blocking with 5% (wt/vol) nonfat milk in PBS for 1 hour at room temperature, the plates were washed three times with PBST (PBS with 0.05% [vol/vol] Tween-20) and incubated with serially diluted non-glycosylated and glycosylated YebF-Im7 or YebF-scFv-HER2 glycovariants for 1 hour at room temperature. After three washes with PBST, 50 μL of 1:2,500-diluted HRP-conjugated anti-DDDK tag antibody (Abcam, ab49763) for Im7 variants, or 50 μL of 1:5,000-diluted HRP-conjugated anti-6xHis tag antibody (Abcam, ab1187) for scFv-HER2 variants, each in 1% PBST, was added to each well for 1 hour. The plates were washed three times and then developed with 50 µL of 1-Step Ultra TMB-ELISA substrate solution (Thermo Fisher).

The enzymatic activity of RNase A variants was determined using the RNaseAlert-1 kit (Integrated DNA Technologies) according to the manufacturer's protocol. Each 80-fold diluted supernatant sample was normalized to an optical density at 600 nm (OD600) equal to that of the positive control strain expressing wt RNase A. Samples were then mixed with 20 pmol of RNase A substrate and 10 µL of 10× RNaseAlert buffer and incubated in RNase-free black 96-well microplates (Fisher) at 37 °C for 30 minutes. Fluorescence was measured at excitation/emission wavelengths of 490 nm/520 nm.
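
The normalization step can be illustrated with a short calculation. This is a sketch under stated assumptions, not the procedure from the kit protocol or from the authors. It assumes normalization means diluting each sample with buffer so that its culture OD600 equivalent matches the control, and the function name and the numbers are hypothetical.

```python
def normalized_volumes(od_sample: float, od_reference: float,
                       final_volume_ul: float) -> tuple[float, float]:
    """Return (sample_ul, buffer_ul) so the mix matches the reference OD600.

    Assumes simple proportional dilution; a sample that is already less dense
    than the reference cannot be normalized by dilution and raises ValueError.
    """
    if od_sample <= 0 or od_reference <= 0:
        raise ValueError("OD600 values must be positive")
    if od_sample < od_reference:
        raise ValueError("sample is less dense than the reference")
    sample_ul = final_volume_ul * od_reference / od_sample
    return sample_ul, final_volume_ul - sample_ul


# Hypothetical example: a culture at OD600 2.0, a control at 1.0, 100 µL final.
sample_ul, buffer_ul = normalized_volumes(2.0, 1.0, 100.0)
print(f"sample: {sample_ul:.1f} µL, buffer: {buffer_ul:.1f} µL")
```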

Far-ultraviolet (UV) CD spectra of purified Im7 (in 50 mM sodium phosphate and 400 mM sodium sulfate, pH 7.4) were recorded as a function of temperature in a 0.1 cm cuvette on a spectropolarimeter. Spectra were acquired between 200 nm and 260 nm with a step resolution of 1 nm. The melting temperatures of purified glycovariants were determined by high-throughput DSF as described previously (60). Briefly, 5 to 10 µg of protein was mixed with Protein Thermal Shift buffer and Protein Thermal Shift dye, purchased as the Protein Thermal Shift Dye Kit (Thermo Fisher Scientific), according to the manufacturer's instructions. Melting curves were generated by increasing the temperature from 10 °C to 90 °C at a rate of 0.06 °C/s on an Applied Biosystems ViiA 7 instrument (Life Technologies) while monitoring fluorescence at excitation/emission wavelengths of 465 nm/610 nm. To calculate Tm values, the collected data were analyzed by nonlinear regression using the Boltzmann equation in Prism 8.4.2 (GraphPad).
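
For readers without Prism, the Tm calculation can be sketched in plain Python. The Boltzmann sigmoid has the form F(T) = bottom + (top − bottom) / (1 + exp((Tm − T) / slope)). Prism fits it by iterative nonlinear regression. The sketch below substitutes a simpler grid search over Tm and slope, solving for bottom and top by linear least squares at each grid point, so its resolution is limited by the grid spacing. The function names, the grid ranges, and the synthetic data are all assumptions made for illustration.

```python
import math


def boltzmann(t, bottom, top, tm, slope):
    """Boltzmann sigmoid: fluorescence as a function of temperature t."""
    return bottom + (top - bottom) / (1.0 + math.exp((tm - t) / slope))


def fit_tm(temps, fluor, tm_step=0.25, slopes=None):
    """Grid-search fit of the Boltzmann sigmoid; returns the best parameters."""
    if slopes is None:
        slopes = [0.5 + 0.1 * k for k in range(46)]  # candidate slopes 0.5 to 5.0
    n = len(temps)
    t_min, t_max = min(temps), max(temps)
    mean_f = sum(fluor) / n
    best = None
    for i in range(int(round((t_max - t_min) / tm_step)) + 1):
        tm = t_min + i * tm_step
        for slope in slopes:
            # For fixed tm and slope the model is linear: F = bottom + amp * g.
            g = [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]
            mean_g = sum(g) / n
            sgg = sum((gi - mean_g) ** 2 for gi in g)
            if sgg == 0.0:
                continue  # degenerate curve, cannot solve for amp
            sgf = sum((gi - mean_g) * (fi - mean_f) for gi, fi in zip(g, fluor))
            amp = sgf / sgg
            bottom = mean_f - amp * mean_g
            sse = sum((fi - (bottom + amp * gi)) ** 2 for gi, fi in zip(g, fluor))
            if best is None or sse < best[0]:
                best = (sse, tm, slope, bottom, bottom + amp)
    if best is None:
        raise ValueError("no valid fit found")
    _, tm, slope, bottom, top = best
    return {"tm": tm, "slope": slope, "bottom": bottom, "top": top}


# Synthetic, noise-free melting curve (not experimental data).
temps = [10.0 + 2.0 * k for k in range(41)]  # 10 °C to 90 °C
fluor = [boltzmann(t, 100.0, 1000.0, 55.0, 2.0) for t in temps]
fit = fit_tm(temps, fluor)
print(fit)
```

A dedicated optimizer would refine the estimate beyond the grid spacing. The grid search is used here only because it needs no third-party libraries and cannot diverge.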

For all computational analysis, including protein structure preparation, geometric calculations and Rosetta protocol, please refer to the SI Appendix, Supplementary Methods.

The Python scripts have been deposited on GitHub (https://github.com/tdm76/Li_PNAS_2021). All other study data are included in the article and/or SI Appendix.

We thank Markus Aebi for providing strain CLM24 and hR6 serum used in this work. We thank the Biotechnology Resource Center Genomics and Bioinformatics Core Facilities of the Cornell Institute of Biotechnology for assistance with sequencing experiments. We also thank Mike Jewett, Milan Mrksich, Eric Sundberg, José-Marc Techner, Weston Kightlinger, Liang Lin, Jessica Stark, and Sai Pooja Mahajan for helpful discussions of the manuscript. This work was supported by the Defense Threat Reduction Agency (HDTRA1-15-10052 and HDTRA1-20-10004 to MPD), NSF (CBET-1159581, CBET-1264701, and CBET-1936823 to MPD), and NIH (1R3011GM to MP3, 1R01GM127578 to MPD and JJG, and 1S10 OD017992-01 to SZ). This work was also supported by seed project funding (to MPD) from the NIH-funded Cornell Center on the Physics of Cancer Metabolism (supporting grant 1U54CA210184). TDM was supported by a training grant from the NIH National Institute of Biomedical Imaging and Bioengineering (T32EB023860) and a Cornell Fleming Graduate Scholarship. TJ was supported by a Royal Thai Government Scholarship and a Cornell Fleming Graduate Scholarship.

1ML and XZ contributed equally to this work.

Author contributions: ML, XZ, SS, TJ, JJG, and MPD designed research; ML, XZ, SS, TJ, TDM, IK, JB, ECC, and QF performed research; ML, XZ, SS, TJ, TDM, SWH, IK, ECC, QF, SZ, JWL, JJG, and MPD analyzed data; and ML, XZ, SS, TJ, SWH, SZ, JWL, JJG, and MPD wrote the paper.

Competing interest statement: MPD has financial interests in Glycobia, Inc., SwiftScale Biologics, Inc., and Versatope Therapeutics, Inc. MPD's interests are reviewed and managed by Cornell University in accordance with its conflict of interest policies. JJG is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, and Johns Hopkins University, Johns Hopkins University may be entitled to a portion of the revenue received from licensing Rosetta software, including some of the methods developed in this article. As a member of its Scientific Advisory Board, JJG has a financial interest in Cyrus Biotechnology. Cyrus Biotechnology distributes the Rosetta software, which may include methods mentioned in this article. JJG's arrangements have been reviewed and approved by Johns Hopkins University in accordance with its conflict of interest policies.

This article is a PNAS Direct Submission. RTR is a guest editor invited by the Editorial Board.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2107440118/-/DCSupplemental.

This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490. PNAS is a partner of CHORUS, COPE, CrossRef, ORCID and Research4Life.